Data Lineage: A Survey
Abstract
Lineage, or provenance, in its most general form describes where data came from, how it was derived, and how it was updated over time. Information management systems today exploit lineage in tasks ranging from data verification in curated databases [1] to confidence computation in probabilistic databases [10, 12]. Here, we formalize and categorize lineage, discuss a set of selected papers, and then identify open problems in lineage research.
Lineage can be useful in a variety of settings. For example, molecular biology databases, which mostly store copied data, can use lineage to verify the copied data by tracking the original sources [1]. Data warehouses can use the lineage of anomalous view data to identify faulty base data [4], and probabilistic databases can exploit lineage for confidence computation [10, 12].
Although lineage can be very valuable for applications, storing and querying lineage can be expensive; for instance, the Gene Ontology database has up to 10MB of lineage for single tuples [10]. Approaches have been developed to lower these costs. For example, in probabilistic databases, approximate lineage can compress the complete lineage by up to two orders of magnitude while allowing a selected set of queries over the lineage to be answered efficiently with low error [10]. In curated databases, storing a type of lineage called hierarchical transactional provenance can reduce the storage overhead by a factor of 5, relative to a more naive approach [1].
In addition to challenges related to space and time efficiency, it can be difficult even to define lineage in domains that allow arbitrary transformations. The most commonly considered transformations are relational queries [4, 9, 10, 12], but some papers have studied lineage for a broad range of transformations. For example, in data warehousing, tracing procedures have been developed for general transformations that take advantage of transformation properties specified by the transformation definer [3]. In curated databases, lineage for copying operations has been studied [1]. For a generalization of standard relational queries called DQL (Deterministic QL), definitions for lineage that are invariant under query rewriting have been proposed [2].
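To make the data warehouse use case concrete, the following Python sketch shows one simple way lineage could be carried through relational operators so that a suspicious view tuple can be traced back to its base tuples. It is an illustration under stated assumptions, not the formal definition or tracing procedure of any cited paper: the `Row` class, the `base_table`, `join`, and `project` helpers, the `table:index` identifier scheme, and the sample `orders` and `customers` data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Row:
    """A tuple of values plus the set of base-tuple ids it was derived from."""
    values: Tuple
    lineage: FrozenSet[str]


def base_table(name: str, rows: Iterable[Sequence]) -> List[Row]:
    # Each base tuple's lineage is just its own (hypothetical) id "name:index".
    return [Row(tuple(r), frozenset({f"{name}:{i}"})) for i, r in enumerate(rows)]


def join(left: List[Row], right: List[Row], left_col: int, right_col: int) -> List[Row]:
    # A joined tuple depends on both of its inputs, so the lineages are unioned.
    return [
        Row(l.values + r.values, l.lineage | r.lineage)
        for l in left
        for r in right
        if l.values[left_col] == r.values[right_col]
    ]


def project(rows: List[Row], cols: Sequence[int]) -> List[Row]:
    # Duplicate-eliminating projection: tuples that collapse to the same
    # output tuple contribute all of their lineages to it.
    merged: Dict[Tuple, FrozenSet[str]] = {}
    for r in rows:
        key = tuple(r.values[c] for c in cols)
        merged[key] = merged.get(key, frozenset()) | r.lineage
    return [Row(k, v) for k, v in merged.items()]


# Hypothetical base data: (order id, customer, amount) and (customer, country).
orders = base_table("orders", [("o1", "alice", 30), ("o2", "bob", -5), ("o3", "alice", 12)])
customers = base_table("customers", [("alice", "NL"), ("bob", "DE")])

# View: the countries that have at least one order (country is column 4 after the join).
view = project(join(orders, customers, left_col=1, right_col=0), cols=[4])

# Map each view tuple to the base tuples it was derived from.
lineage_by_country = {r.values[0]: sorted(r.lineage) for r in view}
```

If the `DE` view tuple looked anomalous, `lineage_by_country["DE"]` would narrow the search to the base tuples `customers:1` and `orders:1`, the latter being the order with the negative amount. Storing a full set of identifiers per output tuple is also a small-scale picture of why lineage can become expensive to store.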